16 research outputs found

    Contact Prediction is Hardest for the Most Informative Contacts, but Improves with the Incorporation of Contact Potentials

    Co-evolution between pairs of residues in a multiple sequence alignment (MSA) of homologous proteins has long been proposed as an indicator of structural contacts. Recently, several methods, such as direct-coupling analysis (DCA) and MetaPSICOV, have been shown to achieve impressive rates of contact prediction by taking advantage of considerable sequence data. In this paper, we show that prediction success rates are highly sensitive to the structural definition of a contact, with more permissive definitions (i.e., those classifying more pairs as true contacts) naturally leading to higher positive predictive rates, but at the expense of the amount of structural information contributed by each contact. Thus, the remaining limitations of contact prediction algorithms are most noticeable in conjunction with geometrically restrictive contacts, precisely those that contribute more information in structure prediction. We suggest that to improve prediction rates for such "informative" contacts one could combine co-evolution scores with additional indicators of contact likelihood. Specifically, we find that when a pair of co-varying positions in an MSA is occupied by residue pairs with favorable statistical contact energies, that pair is more likely to represent a true contact. We show that combining a contact potential metric with DCA or MetaPSICOV performs considerably better than DCA or MetaPSICOV alone, respectively. This is true regardless of contact definition, but especially true for stricter and more informative contact definitions. In summary, this work outlines some remaining challenges to be addressed in contact prediction and proposes and validates a promising direction towards improvement.
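
    The dependence of the positive predictive rate on the contact definition can be illustrated with a minimal sketch. It assumes a purely distance-based definition with one representative atom per residue; the coordinates, the predicted pairs, the cutoffs, and the function names are hypothetical and are not taken from the paper.

```python
from math import dist


def is_contact(coord_a, coord_b, cutoff):
    """Assumed definition: two positions are in contact if their
    representative atoms lie within `cutoff` angstroms of each other."""
    return dist(coord_a, coord_b) <= cutoff


def positive_predictive_value(predicted_pairs, coords, cutoff):
    """Fraction of predicted pairs that are true contacts under `cutoff`."""
    if not predicted_pairs:
        return 0.0
    hits = sum(is_contact(coords[i], coords[j], cutoff) for i, j in predicted_pairs)
    return hits / len(predicted_pairs)


# Toy data (hypothetical): position index -> 3D coordinate in angstroms.
coords = {
    1: (0.0, 0.0, 0.0),
    2: (5.0, 0.0, 0.0),
    3: (0.0, 7.0, 0.0),
    4: (0.0, 0.0, 9.5),
}
# Pairs that some co-evolution method is assumed to have ranked highest.
predicted = [(1, 2), (1, 3), (1, 4), (2, 3)]

# The same predictions score better as the definition becomes more permissive.
for cutoff in (6.0, 8.0, 10.0):
    ppv = positive_predictive_value(predicted, coords, cutoff)
    print(f"cutoff {cutoff:4.1f} A -> PPV {ppv:.2f}")
```

    With these toy numbers the rate rises from 0.25 to 0.50 to 1.00 as the cutoff is loosened, even though the predictions themselves never change.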

    Paper-based devices for rapid diagnosis and wastewater surveillance

    Infectious diseases are a global concern for public health, resulting in high rates of infection with subsequent health and socio-economic impacts through resulting morbidity and mortality. The emergence of such diseases has motivated researchers to develop cost-effective, rapid and sensitive analytical methods and devices to better understand the transmission routes of infections within populations. To this end, rapid and low-cost diagnosis and testing devices for infectious diseases, such as paper-based analytical devices (PADs), are attracting increasing attention. In this paper, the recent development of PADs is critically reviewed both for the diagnosis of individuals and for population health, the latter by using devices for testing wastewater. Finally, the review also focuses on PADs for the analysis of bacteria and viruses in wastewater, together with a discussion on the future development of PADs for rapid diagnosis and wastewater surveillance.

    Functional Genomics Annotation of a Statistical Epistasis Network Associated with Bladder Cancer Susceptibility

    Background: Several different genetic and environmental factors have been identified as independent risk factors for bladder cancer in population-based studies. Recent studies have turned to understanding the role of gene-gene and gene-environment interactions in determining risk. We previously developed the bioinformatics framework of statistical epistasis networks (SEN) to characterize the global structure of interacting genetic factors associated with a particular disease or clinical outcome. By applying SEN to a population-based study of bladder cancer among Caucasians in New Hampshire, we were able to identify a set of connected genetic factors with strong and significant interaction effects on bladder cancer susceptibility. Findings: To support our statistical findings using networks, in the present study, we performed pathway enrichment analyses on the set of genes identified using SEN, and found that they are associated with the carcinogen benzo[a]pyrene, a component of tobacco smoke. We further carried out an mRNA expression microarray experiment to validate statistical genetic interactions, and to determine if the set of genes identified in the SEN were differentially expressed in a normal bladder cell line and a bladder cancer cell line in the presence or absence of benzo[a]pyrene. Significant nonrandom sets of genes from the SEN were found to be differentially expressed in response to benzo[a]pyrene in both the normal bladder cells and the bladder cancer cells. In addition, the patterns of gene expression were significantly different between these two cell types. Conclusions: The enrichment analyses and the gene expression microarray results support the idea that SEN analysis of bladder cancer in population-based studies is able to identify biologically meaningful statistical patterns. These results bring us a step closer to a systems genetic approach to understanding cancer susceptibility that integrates population and laboratory-based studies.

    Diverse Convergent Evidence in the Genetic Analysis of Complex Disease: Coordinating Omic, Informatic, and Experimental Evidence to Better Identify and Validate Risk Factors

    In omic research, such as genome-wide association studies, researchers seek to repeat their results in other datasets to reduce false positive findings and thus provide evidence for the existence of true associations. Unfortunately, this standard validation approach cannot completely eliminate false positive conclusions, and it can also mask many true associations that might otherwise advance our understanding of pathology. These issues raise the question: How can we increase the amount of knowledge gained from high-throughput genetic data? To address this challenge, we present an approach that complements standard statistical validation methods by drawing attention to both potential false negative and false positive conclusions, as well as providing broad information for directing future research. The Diverse Convergent Evidence approach (DiCE) we propose integrates information from multiple sources (omics, informatics, and laboratory experiments) to estimate the strength of the available corroborating evidence supporting a given association. This process is designed to yield an evidence metric that has utility when etiologic heterogeneity, variable risk factor frequencies, and a variety of observational data imperfections might lead to false conclusions. We provide proof-of-principle examples in which DiCE identified strong evidence for associations that have established biological importance, when standard validation methods alone did not provide support. If used as an adjunct to standard validation methods, this approach can leverage multiple distinct data types to improve genetic risk factor discovery/validation, promote effective science communication, and guide future research directions.

    The effects of incorporating a contact potential into contact prediction.

    <p>In plots A, C, E, and G, <i>DI</i> refers to predictions made using direct information alone. In plots B, D, F, and H, <i>MPC</i> refers to MetaPSICOV's predictions alone. <i>DI</i><sub><i>CD</i></sub> and <i>MPC</i><sub><i>CD</i></sub> respectively refer to DI and MPC's predictions augmented by contact degree (see <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0199585#pone.0199585.e005" target="_blank">Eq (3)</a>). Similarly, for <i>n</i> ∈ {1, 2, 3}, <i>DI</i><sub><i>n</i></sub> and <i>MPC</i><sub><i>n</i></sub> respectively refer to DI and MPC's predictions augmented by contact definition <i>n</i>.</p>

    Decoy-discrimination performance of <i>E</i><sub><i>CD</i></sub>, <i>E</i><sub>1</sub>, <i>E</i><sub>2</sub>, and <i>E</i><sub>3</sub> potentials (in columns CD, any-heavy, CB, and centroid, respectively) on the I-TASSER II decoy set.

    <p>Shown is the rank of the native structure, in each sub-set, by the corresponding contact potential. The ranking of natives by <i>E</i><sub><i>CD</i></sub> is significantly better than the rankings using the other potentials, with the p-values from the Friedman test being 7.9 ⋅ 10<sup>−10</sup>, 1.3 ⋅ 10<sup>−5</sup>, and 4.5 ⋅ 10<sup>−5</sup> when comparing <i>E</i><sub><i>CD</i></sub> with <i>E</i><sub>1</sub>, <i>E</i><sub>2</sub>, and <i>E</i><sub>3</sub>, respectively.</p>
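
    The rank of a native structure within a decoy sub-set can be computed as in the following minimal sketch. It assumes that lower energy is better and that ties are resolved in favor of the native; the energies and the function name are hypothetical and do not come from the decoy sets named in the caption.

```python
def native_rank(native_energy, decoy_energies):
    """Rank of the native among its decoys (1 = best).

    Assumption: a lower potential value is more favorable, so the rank is
    one plus the number of decoys scoring strictly below the native.
    """
    return 1 + sum(1 for energy in decoy_energies if energy < native_energy)


# Hypothetical energies for one sub-set: one native and four decoys.
decoys = [-10.2, -12.5, -8.1, -11.0]
print(native_rank(-11.7, decoys))  # one decoy scores lower, so the rank is 2
```

    Collecting one such rank per sub-set and per potential gives the kind of table that a paired rank test, such as the Friedman test mentioned in the caption, can then compare.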

    Distance-based contact definitions can flag unreasonable contact geometries or fail to capture position pairs likely to co-vary.

    <p><b>A)</b>, <b>B)</b>, and <b>C)</b> correspond to any-heavy, <i>C</i><sub><i>β</i></sub>, and centroid-based contact definitions, respectively. The top row shows examples where residue pairs that would be classified as contacting, on the basis of a rather strict distance cutoff in each case, do not appear to have an immediate influence on each other. The bottom row, by contrast, shows cases where a rather loose distance cutoff would miss an apparent contact (i.e., a pair of positions likely to co-vary). The value of the corresponding distance metric and the contact degree value are shown at the bottom of each panel. Residue pairs of interest are highlighted in thick cyan sticks, with their <i>Cα</i> atoms shown as spheres. The contacts shown in the top row correspond to position pairs (A126, A141), (A328, A344), and (V120, V128) from PDB structures 3JUM, 3JU4, and 1LM8 for <b>A)</b>-<b>C)</b>, respectively, and those in the bottom row correspond to position pairs (A55, A62), (C102, C201), and (B144, B153) from PDB structures 1JUH, 1JUH, and 4ACF for <b>A)</b>-<b>C)</b>, respectively. These illustrative cases were identified by manual inspection of a random set of 100 PDB structures. Molecular renderings were created with PyMOL.</p>

    Decoy-discrimination performance of <i>E</i><sub><i>CD</i></sub>, <i>E</i><sub>1</sub>, <i>E</i><sub>2</sub>, and <i>E</i><sub>3</sub> potentials (in columns CD, any-heavy, CB, and centroid, respectively) on the Rosetta decoy set.

    <p>Shown is the rank of the native structure, in each sub-set, by the corresponding contact potential. The ranking of natives by <i>E</i><sub><i>CD</i></sub> is significantly better than the ranking by the any-heavy potential (<i>E</i><sub>1</sub>), whereas potentials <i>E</i><sub>2</sub> and <i>E</i><sub>3</sub> perform similarly to <i>E</i><sub><i>CD</i></sub> (Friedman test p-values are 10<sup>−7</sup>, 0.17, and 0.78, respectively).</p>